Threads & Concurrency
Overview
- A thread is a basic unit of CPU utilization; it comprises a thread ID, a program counter (PC), a register set, and a stack.
- It shares with other threads belonging to the same process its code section, data section, and other operating-system resources, such as open files and signals
Motivation
- Process creation is time consuming and resource intensive. It is generally more efficient to use one process that contains multiple threads
- Most operating system kernels are also typically multithreaded.
- Many applications can also take advantage of multiple threads, including basic sorting, trees, and graph algorithms.
Benefits
- Responsiveness: Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user.
- Resource sharing: Threads share the memory and the resources of the process to which they belong by default.
- Economy: In general, thread creation consumes less time and memory than process creation. Additionally, context switching is typically faster between threads than between processes.
- Scalability: The benefits of multithreading can be even greater in a multiprocessor architecture, where threads may be running in parallel on different processing cores.
Multicore Programming
- Multicore systems place multiple computing cores on a single processing chip, where each core appears as a separate CPU to the operating system. We refer to such systems as multicore, and multithreaded programming provides a mechanism for more efficient use of these multiple computing cores and improved concurrency.
- A concurrent system supports more than one task by allowing all the tasks to make progress. In contrast, a parallel system can perform more than one task simultaneously
Programming Challenges
- Identifying tasks: This involves examining applications to find areas that can be divided into separate, concurrent tasks. Ideally, tasks are independent of one another and thus can run in parallel on individual cores.
- Balance: While identifying tasks that can run in parallel, programmers must also ensure that the tasks perform equal work of equal value.
- Data splitting: Just as applications are divided into separate tasks, the data accessed and manipulated by the tasks must be divided to run on separate cores.
- Data dependency: The data accessed by the tasks must be examined for dependencies between two or more tasks.
- Testing and debugging: When a program is running in parallel on multiple cores, many different execution paths are possible.
Types of Parallelism
- Data parallelism: Data parallelism focuses on distributing subsets of the same data across multiple computing cores and performing the same operation on each core.
- Task parallelism: Task parallelism involves distributing not data but tasks (threads) across multiple computing cores. Each thread performs a unique operation. Different threads may be operating on the same data, or they may be operating on different data.
- Fundamentally, then, data parallelism involves the distribution of data across multiple cores, and task parallelism involves the distribution of tasks across multiple cores.
- Data and task parallelism are not mutually exclusive.
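As a sketch of data parallelism, the following C program splits an array sum across two threads, each performing the same operation on half of the data (assuming POSIX threads, compiled with -pthread; the names `sum_range` and `parallel_sum` are illustrative):

```c
#include <pthread.h>

#define N 1000
static int data[N];
static long partial[2];   /* one partial sum per thread */

struct range { int lo, hi, slot; };

static void *sum_range(void *arg)
{
    struct range *r = arg;
    long s = 0;
    for (int i = r->lo; i < r->hi; i++)
        s += data[i];
    partial[r->slot] = s;   /* each thread writes only its own slot */
    return NULL;
}

long parallel_sum(void)
{
    pthread_t t[2];
    struct range halves[2] = { {0, N / 2, 0}, {N / 2, N, 1} };
    for (int i = 0; i < N; i++)
        data[i] = i + 1;                     /* fill with 1..N */
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, sum_range, &halves[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return partial[0] + partial[1];          /* 1 + 2 + ... + 1000 */
}
```

Each thread runs the same function over a different subset of the data, which is exactly the data-parallel decomposition described above.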
Multithreading Models
- Support for threads may be provided either at the user level, for user threads, or by the kernel, for kernel threads
- User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the operating system
Many-to-One Model
- The many-to-one model maps many user-level threads to one kernel thread.
- Because only one thread can access the kernel at a time, multiple threads are unable to run in parallel on multicore systems.
One-to-One Model
- The one-to-one model maps each user thread to a kernel thread. It provides more concurrency than the many-to-one model by allowing another thread to run when a thread makes a blocking system call.
- The only drawback to this model is that creating a user thread requires creating the corresponding kernel thread, and a large number of kernel threads may burden the performance of a system.
- Linux, along with the family of Windows operating systems, implements the one-to-one model.
Many-to-Many Model
- The many-to-many model multiplexes many user-level threads to a smaller or equal number of kernel threads.
- Developers can create as many user threads as necessary, and the corresponding kernel threads can run in parallel on a multiprocessor. Also, when a thread performs a blocking system call, the kernel can schedule another thread for execution.
- One variation on the many-to-many model still multiplexes many user-level threads to a smaller or equal number of kernel threads but also allows a user-level thread to be bound to a kernel thread. This variation is sometimes referred to as the two-level model.
- Although the many-to-many model appears to be the most flexible of the models discussed, in practice it is difficult to implement.
- In addition, with an increasing number of processing cores appearing on most systems, limiting the number of kernel threads has become less important. As a result, most operating systems now use the one-to-one model.
Thread Libraries
- A thread library provides the programmer with an API for creating and managing threads.
- There are two primary ways of implementing a thread library.
- The first approach is to provide a library entirely in user space with no kernel support. This means that invoking a function in the library results in a local function call in user space and not a system call.
- The second approach is to implement a kernel-level library supported directly by the operating system. Invoking a function in the API for the library typically results in a system call to the kernel.
- Three main thread libraries are in use today:
- POSIX Pthreads: the threads extension of the POSIX standard, which may be provided as either a user-level or a kernel-level library. UNIX, Linux, and macOS systems typically use Pthreads.
- Windows: a kernel-level library available on Windows systems.
- Java: the Java thread API is generally implemented using a thread library available on the host system.
- For POSIX and Windows threading, any data declared globally—that is, declared outside of any function—are shared among all threads belonging to the same process. Because Java has no equivalent notion of global data, access to shared data must be explicitly arranged between threads.
- Two general strategies for creating multiple threads:
- Asynchronous threading: the parent and child execute concurrently and independently of one another. There is typically little data sharing between them.
- Synchronous threading: the threads created by the parent perform work concurrently, but the parent cannot continue until this work has been completed. Typically, synchronous threading involves significant data sharing among threads.
Pthreads
- Pthreads refers to the POSIX standard (IEEE 1003.1c) defining an API for thread creation and synchronization. This is a specification for thread behavior, not an implementation
- The program below has two threads: the initial (or parent) thread in main() and the summation (or child) thread performing the summation operation in the runner() function.
```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int sum; /* this data is shared by the thread(s) */
void *runner(void *param); /* threads call this function */

int main(int argc, char *argv[])
{
    pthread_t tid;       /* the thread identifier */
    pthread_attr_t attr; /* set of thread attributes */

    if (argc != 2) {
        fprintf(stderr, "usage: %s <integer>\n", argv[0]);
        return 1;
    }

    /* set the default attributes of the thread */
    pthread_attr_init(&attr);
    /* create the thread */
    pthread_create(&tid, &attr, runner, argv[1]);
    /* wait for the thread to exit */
    pthread_join(tid, NULL);

    printf("sum = %d\n", sum);
    return 0;
}

/* The thread will execute in this function */
void *runner(void *param)
{
    int i, upper = atoi(param);
    sum = 0;
    for (i = 1; i <= upper; i++)
        sum += i;
    pthread_exit(0);
}
```
Windows Threads
- The Windows thread API follows a similar structure (using windows.h): CreateThread() creates a thread, passing it the function to run, and WaitForSingleObject() waits for the thread to complete, much like pthread_join().
Java Threads
- Thread creation in Java involves creating a Thread object and passing it an instance of a class that implements Runnable, followed by invoking the start() method on the Thread object.
- We call the start() method, and it calls the run() method on our behalf; we never call run() directly.
Java Executor Framework
- Additionally, this approach separates the creation of threads from the results they produce: rather than waiting for a thread to terminate before retrieving results, the parent instead only waits for the results to become available.
Implicit Threading
- One way to address these difficulties and better support the design of concurrent and parallel applications is to transfer the creation and management of threading from application developers to compilers and run-time libraries. This strategy is termed implicit threading.
- These strategies generally require application developers to identify tasks, not threads, that can run in parallel. A task is usually written as a function.
- The advantage of this approach is that developers only need to identify parallel tasks, and the libraries determine the specific details of thread creation and management.
Thread Pools
- The general idea behind a thread pool is to create a number of threads at start-up and place them into a pool, where they sit and wait for work.
- Thread pools work well when the tasks submitted to the pool can be executed asynchronously.
- Thread pools offer these benefits:
- Servicing a request with an existing thread is often faster than waiting to create a thread.
- A thread pool limits the number of threads that exist at any one point.
- Separating the task to be performed from the mechanics of creating the task allows us to use different strategies for running the task
- The Windows API also provides a thread pool facility (details omitted here).
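The general idea above can be sketched in C as a fixed set of worker threads pulling tasks from a mutex-guarded queue (assuming POSIX threads; all names are illustrative, and a production pool would also need dynamic sizing and richer error handling):

```c
#include <pthread.h>
#include <stdatomic.h>

#define QSIZE 64

typedef void (*task_fn)(void *);
struct task { task_fn fn; void *arg; };

static struct task queue[QSIZE];              /* circular task queue */
static int head, tail, count, shutting_down;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (count == 0 && !shutting_down)
            pthread_cond_wait(&not_empty, &qlock);  /* sit and wait for work */
        if (count == 0 && shutting_down) {          /* drained and told to stop */
            pthread_mutex_unlock(&qlock);
            return NULL;
        }
        struct task t = queue[head];
        head = (head + 1) % QSIZE;
        count--;
        pthread_mutex_unlock(&qlock);
        t.fn(t.arg);                                /* run the task outside the lock */
    }
}

int pool_submit(task_fn fn, void *arg)
{
    pthread_mutex_lock(&qlock);
    if (count == QSIZE) { pthread_mutex_unlock(&qlock); return -1; }
    queue[tail] = (struct task){ fn, arg };
    tail = (tail + 1) % QSIZE;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&qlock);
    return 0;
}

void pool_run(pthread_t *threads, int n)       /* start n worker threads */
{
    for (int i = 0; i < n; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
}

void pool_shutdown(pthread_t *threads, int n)  /* finish queued work, then join */
{
    pthread_mutex_lock(&qlock);
    shutting_down = 1;
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&qlock);
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
}

static _Atomic int done_count;
static void bump(void *arg) { (void)arg; done_count++; }

int pool_demo(void)
{
    pthread_t workers[4];
    pool_run(workers, 4);
    for (int i = 0; i < 10; i++)
        pool_submit(bump, NULL);
    pool_shutdown(workers, 4);
    return done_count;    /* all queued tasks ran before shutdown completed */
}
```

Note how the task (`bump`) is separated from the mechanics of running it, the third benefit listed above.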
Java Thread Pools
- Single thread executor—newSingleThreadExecutor()—creates a pool of size 1.
- Fixed thread executor—newFixedThreadPool(int size)—creates a thread pool with a specified number of threads.
- Cached thread executor—newCachedThreadPool()—creates an unbounded thread pool, reusing threads in many instances.
Fork Join
- The main parent thread creates (forks) one or more child threads and then waits for the children to terminate and join with it, at which point it can retrieve and combine their results.
- In some ways, this fork-join model is a synchronous version of thread pools.
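The fork-join pattern can be sketched directly with Pthreads: each call forks a thread for half of the work, handles the other half itself, and joins before combining the results. (This is an illustrative sketch; real fork-join libraries use work-stealing thread pools rather than one thread per fork.)

```c
#include <pthread.h>

struct span { const long *a; int n; long result; };

#define THRESHOLD 64   /* below this size, just sum sequentially */

static void *sum_span(void *arg)
{
    struct span *s = arg;
    if (s->n <= THRESHOLD) {
        long total = 0;
        for (int i = 0; i < s->n; i++)
            total += s->a[i];
        s->result = total;
        return NULL;
    }
    int mid = s->n / 2;
    struct span left  = { s->a, mid, 0 };
    struct span right = { s->a + mid, s->n - mid, 0 };
    pthread_t t;
    pthread_create(&t, NULL, sum_span, &left);   /* fork: left half in a new thread */
    sum_span(&right);                            /* handle the right half here */
    pthread_join(t, NULL);                       /* join, then combine */
    s->result = left.result + right.result;
    return NULL;
}

long forkjoin_sum(const long *a, int n)
{
    struct span whole = { a, n, 0 };
    sum_span(&whole);
    return whole.result;
}

long forkjoin_demo(void)
{
    static long a[256];
    for (int i = 0; i < 256; i++)
        a[i] = i + 1;
    return forkjoin_sum(a, 256);   /* 1 + 2 + ... + 256 */
}
```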
Fork Join in Java
- Java introduced a fork-join library in Version 1.7 of the API that is designed to be used with recursive divide-and-conquer algorithms such as Quicksort and Mergesort
OpenMP
- OpenMP is a set of compiler directives as well as an API for programs written in C, C++, or FORTRAN that provides support for parallel programming in shared memory environments.
- OpenMP identifies parallel regions as blocks of code that may run in parallel
- When OpenMP encounters the directive #pragma omp parallel, it creates as many threads as there are processing cores in the system. All the threads then simultaneously execute the parallel region. As each thread exits the parallel region, it is terminated.
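For example, a parallel loop with a reduction clause (a minimal sketch; compile with -fopenmp, and note that without that flag the pragma is simply ignored and the loop runs serially with the same result):

```c
/* Sum an array with an OpenMP parallel for. The reduction clause gives each
 * thread a private partial sum that is combined when the region ends. */
long omp_sum(const long *a, int n)
{
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

long omp_demo(void)
{
    static long a[1000];
    for (int i = 0; i < 1000; i++)
        a[i] = i + 1;
    return omp_sum(a, 1000);   /* 1 + 2 + ... + 1000 */
}
```

The loop body is the task; the compiler and run-time library decide how many threads to create and how to split the iterations among them, which is the essence of implicit threading.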
Grand Central Dispatch
- Grand Central Dispatch (GCD) is a technology developed by Apple for its macOS and iOS operating systems. It is a combination of a run-time library, an API, and language extensions that allow developers to identify sections of code (tasks) to run in parallel. Like OpenMP, GCD manages most of the details of threading.
- GCD schedules tasks for run-time execution by placing them on a dispatch queue. GCD identifies two types of dispatch queues: serial and concurrent.
- Each process has its own serial queue (known as its main queue)
Intel Thread Building Blocks
- Intel threading building blocks (TBB) is a template library that supports designing parallel applications in C++.
Threading Issues
The fork() and exec() System Calls
- Some UNIX systems have chosen to have two versions of fork(), one that duplicates all threads and another that duplicates only the thread that invoked the fork() system call.
- if a thread invokes the exec() system call, the program specified in the parameter to exec() will replace the entire process—including all threads.
- If the separate process does not call exec() after forking, it should duplicate all threads. If exec() is called immediately after forking, then duplicating all threads is unnecessary; duplicating only the calling thread is appropriate.
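The fork-then-exec pattern can be sketched as follows (an illustrative POSIX sketch; `run_program` is a made-up name, and the /bin/true path in the demo assumes a Unix-like system):

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* fork() then exec(): the new program replaces the child's entire process,
 * including any threads, so duplicating all threads before an immediate
 * exec() would be wasted work. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0) {                /* child */
        execv(path, argv);         /* replaces the entire process image */
        _exit(127);                /* reached only if exec failed */
    }
    int status;                    /* parent waits for the child */
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

int exec_demo(void)
{
    char *args[] = { "/bin/true", (char *)0 };
    return run_program("/bin/true", args);   /* /bin/true exits with status 0 */
}
```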
Signal Handling
- A signal is used in UNIX systems to notify a process that a particular event has occurred
- All signals follow the same pattern:
- A signal is generated by the occurrence of a particular event
- The signal is delivered to a process.
- Once delivered, the signal must be handled
- A signal may be handled by one of two possible handlers:
- A default signal handler
- A user-defined signal handler
- Every signal has a default signal handler that the kernel runs when handling that signal. This default action can be overridden by a user-defined signal handler that is called to handle the signal.
- In a multithreaded program, a signal may be delivered according to one of the following options:
- Deliver the signal to the thread to which the signal applies.
- Deliver the signal to every thread in the process.
- Deliver the signal to certain threads in the process.
- Assign a specific thread to receive all signals for the process
- Synchronous signals need to be delivered to the thread causing the signal and not to other threads in the process. Some asynchronous signals should be sent to all threads.
- an asynchronous signal may be delivered only to those threads that are not blocking it. However, because signals need to be handled only once, a signal is typically delivered only to the first thread found that is not blocking it.
- Windows: asynchronous procedure calls (APCs)
Thread Cancellation
- Thread cancellation involves terminating a thread before it has completed.
- A thread that is to be canceled is often referred to as the target thread. Cancellation of a target thread may occur in two different scenarios:
- Asynchronous cancellation. One thread immediately terminates the target thread. Canceling a thread asynchronously may not free a necessary system-wide resource.
- Deferred cancellation. The target thread periodically checks whether it should terminate, allowing it an opportunity to terminate itself in an orderly fashion. The thread can perform this check at a point at which it can be canceled safely.
- The default cancellation type is deferred cancellation. However, cancellation occurs only when a thread reaches a cancellation point
- Pthreads allows a function known as a cleanup handler to be invoked if a thread is canceled. This function allows any resources a thread may have acquired to be released before the thread is terminated.
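Deferred cancellation with a cleanup handler can be sketched as follows (assuming POSIX threads; `cancel_demo` and the other names are illustrative):

```c
#include <pthread.h>
#include <stdlib.h>

static int cleanup_ran;

static void release_resources(void *arg)
{
    free(arg);          /* release whatever the thread had acquired */
    cleanup_ran = 1;
}

static void *target(void *unused)
{
    (void)unused;
    void *resource = malloc(64);
    /* Register the cleanup handler so the resource is released on cancel. */
    pthread_cleanup_push(release_resources, resource);
    for (;;)
        pthread_testcancel();    /* explicit cancellation point */
    pthread_cleanup_pop(1);      /* never reached; balances the push */
    return NULL;
}

int cancel_demo(void)
{
    pthread_t t;
    void *res;
    pthread_create(&t, NULL, target, NULL);
    pthread_cancel(t);           /* request deferred cancellation */
    pthread_join(t, &res);       /* canceled threads return PTHREAD_CANCELED */
    return cleanup_ran && res == PTHREAD_CANCELED;
}
```

The cancellation takes effect only when the target reaches pthread_testcancel(), and the cleanup handler runs before the thread terminates, as described above.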
Thread-Local Storage
- in some circumstances, each thread might need its own copy of certain data. We will call such data thread-local storage (or TLS)
- It is easy to confuse TLS with local variables. However, local variables are visible only during a single function invocation, whereas TLS data are visible across function invocations.
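In C, TLS can be declared with the C11 `_Thread_local` storage-class specifier (a sketch; Pthreads also provides pthread_key_create() for the same purpose):

```c
#include <pthread.h>

/* Each thread gets its own copy of `counter`; unlike a local variable,
 * it persists across function invocations within a thread. */
static _Thread_local int counter;

static void bump(void)
{
    counter++;    /* visible across calls, but private to the calling thread */
}

static void *tls_worker(void *out)
{
    bump();
    bump();
    *(int *)out = counter;    /* each thread reports only its own copy */
    return NULL;
}

int tls_demo(void)
{
    pthread_t a, b;
    int seen_a = 0, seen_b = 0;
    pthread_create(&a, NULL, tls_worker, &seen_a);
    pthread_create(&b, NULL, tls_worker, &seen_b);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Each thread incremented its own counter twice, with no interference. */
    return seen_a == 2 && seen_b == 2;
}
```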
Scheduler Activations
- Many systems implementing either the many-to-many or the two-level model place an intermediate data structure between the user and kernel threads. This data structure, typically known as a lightweight process (LWP), appears to the user-thread library as a virtual processor on which the application can schedule a user thread to run.
- One scheme for communication between the user-thread library and the kernel is known as scheduler activation
- The kernel provides an application with a set of virtual processors (LWPs), and the application can schedule user threads onto an available virtual processor. Furthermore, the kernel must inform an application about certain events. This procedure is known as an upcall. Upcalls are handled by the thread library with an upcall handler, and upcall handlers must run on a virtual processor.
Operating-System Examples
Windows Threads
- Windows uses the one-to-one mapping
- The general components of a thread include:
- A thread ID uniquely identifying the thread
- A register set representing the status of the processor
- A program counter
- A user stack, employed when the thread is running in user mode, and a kernel stack, employed when the thread is running in kernel mode
- A private storage area used by various run-time libraries and dynamic link libraries (DLLs)
- The register set, stacks, and private storage area are known as the context of the thread
- The primary data structures of a thread include:
- ETHREAD—executive thread block: the ETHREAD includes a pointer to the process to which the thread belongs and the address of the routine in which the thread starts control. The ETHREAD also contains a pointer to the corresponding KTHREAD.
- KTHREAD—kernel thread block: the KTHREAD includes scheduling and synchronization information for the thread. In addition, the KTHREAD includes the kernel stack (used when the thread is running in kernel mode) and a pointer to the TEB.
- TEB—thread environment block: the TEB is a user-space data structure that is accessed when the thread is running in user mode. Among other fields, the TEB contains the thread identifier, a user-mode stack, and an array for thread-local storage.
Linux Threads
- Unlike many other operating systems, Linux does not distinguish between processes and threads; instead, it refers to each as a task. The Linux clone() system call can be used to create tasks that behave either more like processes or more like threads.
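A Linux-specific sketch of clone(): with the CLONE_VM flag the child task shares the parent's address space (thread-like behavior), so a write by the child is visible to the parent; without it, the child gets its own copy (process-like behavior). Error handling is omitted for brevity.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

static int shared_value;

static int child_fn(void *arg)
{
    (void)arg;
    shared_value = 42;    /* visible to the parent because of CLONE_VM */
    return 0;
}

int clone_demo(void)
{
    const int STACK_SIZE = 64 * 1024;
    char *stack = malloc(STACK_SIZE);
    /* The stack grows downward on most architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, CLONE_VM | SIGCHLD, NULL);
    if (pid < 0)
        return -1;
    waitpid(pid, NULL, 0);    /* SIGCHLD lets the parent wait as after fork() */
    free(stack);
    return shared_value;
}
```

Passing different flag combinations (CLONE_FS, CLONE_FILES, CLONE_SIGHAND, and so on) is how Linux makes a new task behave more like a thread or more like a process.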